Conversation
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #279      +/-   ##
==========================================
- Coverage   83.03%   81.23%    -1.80%
==========================================
  Files          48       50        +2
  Lines        6242     6662      +420
  Branches      480      509       +29
==========================================
+ Hits         5183     5412      +229
- Misses       1059     1250      +191
```
Flags with carried forward coverage won't be shown.
Force-pushed from ff71b2a to ab02a28
Iterate over all interpreters in the linked list when determining the PID offset, fixing an issue where pystack only inspected the first interpreter, which is not guaranteed to be the main interpreter. This ensures the correct TID is located even when multiple subinterpreters are present. Add support for subinterpreters in pure Python stack reporting. Threads running in subinterpreters are now detected and grouped by interpreter ID. Native stack reporting for subinterpreters is not yet supported. Signed-off-by: Saul Cooperman <saulcoops@gmail.com>
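A minimal sketch of the grouping the commit describes, assuming hypothetical `(tid, interp_id)` thread records (pystack's real thread objects differ):

```python
from collections import defaultdict

def group_by_interpreter(threads):
    """Group (tid, interp_id) thread records by owning interpreter ID."""
    groups = defaultdict(list)
    for tid, interp_id in threads:
        groups[interp_id].append(tid)
    return dict(groups)

# Threads from two interpreters get reported per interpreter, not as one pool.
threads = [(101, 0), (102, 1), (103, 0)]
```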
Force-pushed from 6e47658 to c46223d
…chors
When multiple subinterpreters execute on the same OS thread, each
PyThread previously received the full native stack for that TID. That
made native/Python merging fail because every thread in the group saw
the same set of eval frames, so n_eval did not match each thread's
entry-frame count.
This change makes native merging deterministic for same-TID
subinterpreter groups.
The approach works like this:
- Capture a per-thread stack anchor in the native layer:
- add Thread::StackAnchor() and d_stack_anchor.
- compute the anchor from the Python frame chain by walking backwards
to the nearest stack/shim-owned frame (FRAME_OWNED_BY_INTERPRETER /
FRAME_OWNED_BY_CSTACK on 3.14+, FRAME_OWNED_BY_CSTACK on 3.12/3.13).
- Thread construction now forwards this anchor into PyThread as stack_anchor.
- Switch process/core thread assembly from immediate yielding to collect-then-normalize.
- Group Python threads by tid when native mode is enabled.
- For groups with more than one thread:
- pick a canonical native stack,
- sort group members by stack_anchor (stable tie-breaker),
- partition eval-frame ownership according to each thread's Python entry-frame count,
- slice native frames accordingly per thread.
- If counts are inconsistent, keep existing behavior for that group and skip slicing.
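The collect-then-normalize slicing above can be sketched as a simplified model (this is not pystack's actual code: frames are plain ints, `eval_positions` marks which indices are eval frames, and `entry_counts` is already in anchor order):

```python
def partition_native_frames(canonical_frames, eval_positions, entry_counts):
    """Slice one canonical native stack into per-thread portions.

    Returns one slice of canonical_frames per thread, or None when the
    counts are inconsistent (mirroring the "skip slicing" fallback).
    """
    if sum(entry_counts) != len(eval_positions):
        return None  # inconsistent: keep existing behavior for this group
    slices = []
    cursor = 0
    consumed = 0
    for count in entry_counts:
        consumed += count
        # Each thread's slice ends where the next thread's first eval
        # frame begins; the last thread takes the rest of the stack.
        end = (
            eval_positions[consumed]
            if consumed < len(eval_positions)
            else len(canonical_frames)
        )
        slices.append(canonical_frames[cursor:end])
        cursor = end
    return slices
```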
Force-pushed from 55139f0 to 77a8d89
After chatting with @godlygeek when he was here in London I have devised a way that we can use. Note: this still needs refinement. Example:
This seems to be true, but it also doesn't seem to actually matter. If I understand correctly, you're talking about this assignment failing when there's multiple interpreters:

```cpp
if (tid_offset_in_pthread_struct == 0) {
    tid_offset_in_pthread_struct = findPthreadTidOffset(manager, addr);
}
```

and logging out

```cpp
LOG(ERROR) << "Could not find tid offset in pthread structure";
```

when it does. But:

```cpp
if (manager->versionIsAtLeast(3, 11)) {
    the_tid = ts.getField(&py_thread_v::o_native_thread_id);
} else {
    the_tid = inferTidFromPThreadStructure(manager, pthread_id);
}
```

I'm pretty sure that getting support for multiple interpreters working on Python 3.10 and earlier is out of scope for us, so I think we should just skip this call entirely when running against 3.11 or newer:

```diff
diff --git a/src/pystack/_pystack/pythread.cpp b/src/pystack/_pystack/pythread.cpp
index ae496e9..f287fb4 100644
--- a/src/pystack/_pystack/pythread.cpp
+++ b/src/pystack/_pystack/pythread.cpp
@@ -418,7 +409,7 @@ getThreadFromInterpreterState(
         const std::shared_ptr<const AbstractProcessManager>& manager,
         remote_addr_t addr)
 {
-    if (tid_offset_in_pthread_struct == 0) {
+    if (tid_offset_in_pthread_struct == 0 && !manager->versionIsAtLeast(3, 11)) {
         tid_offset_in_pthread_struct = findPthreadTidOffset(manager, addr);
     }
```

That gets rid of the warning without needing any of the other changes. Let me know if you disagree with that assessment, @scopreon!
```cpp
{
    if (!manager->versionIsAtLeast(3, 7)) {
        // No support for subinterpreters so the only interpreter is ID 0.
        return 0;
```
Hm. This feels like the wrong way to handle this... It's technically possible to have multiple interpreters even for 3.6, though I don't think we'd support it particularly well if someone did it. But perhaps we should just use the address, or something like that, as a reasonable fallback? Maybe just
```diff
-return 0;
+return interpreter_addr;
```
Though we might want the function to return a `uint64_t` instead of an `int64_t` in that case...
Or maybe we should just make up a synthetic ID, counting up from 0...
I guess this is fine for now, maybe we can just punt on it unless and until someone complains...
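The "synthetic ID counting up from 0" idea could look roughly like this (the class and method names are made up for illustration):

```python
import itertools

class SyntheticInterpreterIds:
    """Map interpreter addresses to stable synthetic IDs: 0, 1, 2, ..."""

    def __init__(self):
        self._ids = {}
        self._next = itertools.count()

    def id_for(self, interpreter_addr):
        # First sighting of an address allocates the next ID; later
        # lookups for the same address return the same ID.
        if interpreter_addr not in self._ids:
            self._ids[interpreter_addr] = next(self._next)
        return self._ids[interpreter_addr]
```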
```cython
cdef extern from "interpreter.h" namespace "pystack":
    cdef cppclass InterpreterUtils:
```
```diff
-Add support for subinterpreters when reporting pure Python stacks. Threads
-running in subinterpreters are now identified and grouped by interpreter ID.
-Native stack reporting for subinterpreters is not yet supported.
+PyStack now supports reporting stacks for processes using subinterpreters.
```
```cpp
    }
    d_stack_anchor = getStackAnchor(manager, frame_addr);
```
This should be inside the `if (frame_addr != (remote_addr_t) nullptr) {` block, since we can't get a stack anchor for a frame we failed to get.
```cpp
    if (!frame_addr) {
        return 0;
    }
```

This check won't be necessary once we move the `getStackAnchor` call into the `if` above.

```diff
-    if (!frame_addr) {
-        return 0;
-    }
```
```python
if len(threads) < 2:
    return

canonical = next((thread for thread in threads if thread.native_frames), None)
```
I don't think I follow this at all. Won't each of the threads in a group have its own copy of exactly the same list of native frames? I don't see how this could ever return anything but the first thread in the list (unless native_frames is an empty list for all of them, I guess).
```python
def _eval_frame_positions(thread: PyThread):
    if not thread.python_version:
        return []
    return [
        index
        for index, native_frame in enumerate(thread.native_frames)
        if frame_type(native_frame, thread.python_version) == NativeFrame.FrameType.EVAL
    ]
```

Unused.

```diff
-def _eval_frame_positions(thread: PyThread):
-    if not thread.python_version:
-        return []
-    return [
-        index
-        for index, native_frame in enumerate(thread.native_frames)
-        if frame_type(native_frame, thread.python_version) == NativeFrame.FrameType.EVAL
-    ]
```
```python
key=lambda item: (
    item[1].stack_anchor is None,
    -(item[1].stack_anchor or 0),
    item[0],
),
```
This needs a comment so bad. Checking my understanding, this sorts:
- Everything with a stack anchor before everything without one
- Then as a tiebreaker sorts by stack anchor in descending order
- Then as a tiebreaker sorts by the index of the thread in the input list in ascending order
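A tiny self-contained demo of that ordering, using synthetic `(name, stack_anchor)` pairs rather than real pystack threads:

```python
threads = [
    ("t0", None),      # no anchor
    ("t1", 0x7F10),
    ("t2", 0x7F80),
    ("t3", None),      # no anchor
]
ordered = sorted(
    enumerate(threads),
    key=lambda item: (
        item[1][1] is None,   # anchored threads sort before anchorless ones
        -(item[1][1] or 0),   # then by stack anchor, descending
        item[0],              # then by position in the input list, ascending
    ),
)
names = [name for _, (name, _) in ordered]
# Highest anchor first, then the anchorless threads in input order.
```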
```python
cursor = 0
for _, thread in ordered_threads:
    required_eval_frames = _entry_frame_count(thread)
    if required_eval_frames == 0:
```
How could this happen? If there is a Python stack, there will always be at least 1 entry frame, right?
I guess this could happen if we've acquired the GIL (so there's a thread state) but we don't yet have any Python stack for that thread state?
```python
eval_positions[group_end]
if group_end < len(eval_positions)
else len(canonical_frames)
```
Wait, how could `group_end < len(eval_positions)` be false, given that we checked above that `sum(entry_counts) == len(eval_positions)`? We know that things pair up correctly, no?
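A toy check of that boundary condition (values are made up): when the counts pair up, `group_end < len(eval_positions)` holds for every group except the last, which falls through to the `len(canonical_frames)` branch.

```python
eval_positions = [0, 2, 5]  # indices of EVAL frames in the canonical stack
entry_counts = [1, 2]       # per-thread entry-frame counts; sums to 3
assert sum(entry_counts) == len(eval_positions)

group_end = 0
boundary_hits = []
for count in entry_counts:
    group_end += count
    boundary_hits.append(group_end < len(eval_positions))
# Every group but the last stays in bounds; the final group's end equals
# len(eval_positions), so its slice extends to the end of the stack.
```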
Issue number of the reported bug or feature request: #59, #280

Describe your changes

This PR adds simple `subinterpreter` support to `pystack`. It preserves all original functionality but does it a few more times by traversing the `interpreter` linked list. It only supports pure Python stacks currently.

Added a new component, `TracebackPrinter`, which should help with printing in future. Additional configuration for the printer can be added in future.

Few changes in `version.h/cpp`.

First contribution so maybe missed something.

Testing performed

Added tests for `TracebackPrinter`. Added an additional end-to-end test for `subinterpreters` support using the new concurrent 3.14 functionality.

Additional context

Example output